DOCUMENT RESUME



ED 169 186 



TB OOa 636 



AUTHOR        Epstein, Kenneth I.
TITLE         Sequential Plans and Formative Evaluation.
PUB DATE      [Apr 75]
NOTE          25p.; Paper presented at the Annual Meeting of the
              American Educational Research Association
              (Washington, D.C., March 30-April 3, 1975)
EDRS PRICE    MF-$0.76 HC-$1.58 Plus Postage.
DESCRIPTORS   Audiovisual Instruction; *Decision Making;
              Educational Alternatives; *Formative Evaluation;
              *Instructional Improvement; *Instructional Materials;
              Objectives; *Sequential Approach; Testing; Test
              Results; Validity

ABSTRACT

Sequential plans are suggested as a basis for developing decision rules for accepting or rejecting instructional materials. The technique allows inferences to be drawn, from formative evaluation sample data, concerning the effectiveness of the instruction for the target population. The procedure calls for explicit statements of the required level of instructional effectiveness and the amount of error that can be tolerated in decision making. When applied to audio-visual instruction developed for the U.S. Army, sequential plans tended to require fewer students and provided a much clearer framework for making decisions than did the traditional 80/80 rule. (Author)






Sequential Plans and Formative Evaluation

Kenneth I. Epstein
Florida State University*






A paper presented at the American Educational Research Association annual meeting, Washington, D.C., April 1975

* Now at the U.S. Army Research Institute for the Behavioral and Social Sciences, Arlington, Virginia






The systematic design of instruction requires that data indicating the degree of effectiveness of instructional materials be collected and used to make improvements in the instructional materials where necessary. This process is often termed formative evaluation. Baker and Alkin (1973), in their review of the state of the art in formative evaluation, focus primarily on the problems of what types of data are useful and what one does when a decision has been made to revise the instruction. The concern of this paper is with the decision making process itself.

Baker and Alkin (1973) suggest that one of the critical factors in judging instructional effectiveness is the extent to which learners master the objectives. We can take this one step further and conceptualize instructional effectiveness in terms of the extent to which any student in the target population is likely to master the objectives, given the opportunity. This is exactly what is implied in the 80/80 criterion often applied to instructional development efforts. The 80/80 criterion implies that the instruction will be considered effective if at least 80% of the students who begin the instruction complete at least 80% of the objectives on the first try.

Let us assume, for the moment, that the 80/80 criterion is reasonable for a particular instructional development effort. That is, we realize that not all of the students who begin the instruction may successfully complete 80% of the objectives on the first try, but we will be satisfied if 80% of them do. Let us also assume that valid and reliable measures of the objectives are available. How do we determine whether the instruction is acceptable in its present state? The obvious answer is to find some people who are members of the population for which the instruction is intended, let them try the instructional materials, and find out how well they perform on a test of the objectives. This obvious answer has at least one difficult question and one serious potential inadequacy associated with it. The difficult question is: "How many people do we need in the tryout group?" The potential inadequacy lies in the second half of the 80/80 criterion, 80% of the objectives. We will deal with the second problem first.

Table 1 contains the results of a hypothetical tryout of instructional material that technically meets the 80/80 criterion. Five students are tested on each of the five objectives of the instruction. A "1" signifies that the test was passed, a "0" that it was failed.

                          OBJECTIVE
                  1     2     3     4     5    Total    %
STUDENT    1      1     1     1     1     0      4     80
           2      1     1     1     1     0      4     80
           3      0     1     1     1     1      4     80
           4      1     1     1     1     0      4     80
           5      1     1     1     1     0      4     80
Total             4     5     5     5     1     20
%                80   100   100   100    20

Table 1: Hypothetical Tryout Data



Looking at these results, we find that all five students completed at least 4 out of 5, or 80%, of the objectives. Thus it seems that we have exceeded the 80/80 criterion and, in fact, have achieved a level of 100/80. However, these encouraging results have been obtained at the expense of objective 5. Only one of the five students completed objective 5. Can this possibly be considered effective instruction? I maintain that it cannot. In formulating decision making procedures it is potentially illogical to interpret criteria such as "80% of the students achieving 80% of the objectives" literally. Each objective must be evaluated independently. The data in Table 1 are not indicative of acceptable instruction. The material pertaining to objective 5 must be revised.
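The contrast between the literal 80/80 reading and objective-by-objective evaluation can be sketched with the Table 1 data. This is a minimal illustration; the function names are hypothetical and not part of the original paper.

```python
# Table 1 data: one row per student (1-5), one column per objective (1-5).
# 1 = objective passed, 0 = objective failed.
TABLE_1 = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
]


def literal_80_80(results, student_level=0.80, objective_level=0.80):
    """Literal 80/80: at least 80% of students pass at least 80% of all objectives."""
    n_objectives = len(results[0])
    passing = sum(1 for row in results if sum(row) / n_objectives >= objective_level)
    return passing / len(results) >= student_level


def per_objective_rates(results):
    """Proportion of students passing each objective, taken one objective at a time."""
    n_students = len(results)
    return [sum(column) / n_students for column in zip(*results)]


print(literal_80_80(TABLE_1))        # True: every student passed 4 of the 5 objectives
print(per_objective_rates(TABLE_1))  # [0.8, 1.0, 1.0, 1.0, 0.2]
```

The literal rule is satisfied, while the per-objective view immediately exposes objective 5.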

Treating objectives independently is not only more logical than attempting to collapse over objectives, it also allows for greater flexibility. For example, suppose that all objectives were not equally critical. One could then attach different criterion levels to each objective depending on its importance. Instruction for absolutely essential skills might require that 95% or even 100% of the students accomplish the objective, while "nice to know" or interesting information might have relatively low criterion levels.

Considering the need to treat objectives independently, the 80/80 criterion rule as originally suggested must be revised. Retaining the requirement that the instruction be effective for at least 80% of the students, the new criterion rule simply states that 80% of the students must achieve each objective. This leads back to the question of the size of the tryout sample.
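The revised rule, with the option of a different criterion level for each objective, can be sketched as follows. The 95% level for objective 2 is a hypothetical choice for illustration, the function name is hypothetical, and the rates are observed sample proportions only; how large the sample must be is the question the paper takes up next.

```python
def objectives_needing_revision(pass_counts, n_students, criterion_levels):
    """Objectives whose observed pass rate is below that objective's own criterion.

    pass_counts and criterion_levels are dicts keyed by objective number.
    """
    return sorted(
        objective
        for objective, passed in pass_counts.items()
        if passed / n_students < criterion_levels[objective]
    )


# Pass counts from Table 1; objective 2 is treated as a hypothetical
# "absolutely essential" skill with a 95% criterion, the rest use 80%.
counts = {1: 4, 2: 5, 3: 5, 4: 5, 5: 1}
levels = {1: 0.80, 2: 0.95, 3: 0.80, 4: 0.80, 5: 0.80}
print(objectives_needing_revision(counts, 5, levels))  # [5]
```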

Suppose that instruction which is effective for at least 80% of the students in the target population is desired. The data gathered during a tryout of the instruction designed to meet this criterion are used to draw inferences about the effectiveness of the instruction for the total target population. Clearly, the larger the sample of students in the tryout group, the better the estimate of instructional effectiveness for the total group will be. An indication of the precision with which population parameters are estimated by sample data is given by confidence limits. For example, in the case of one objective, assume that the tryout sample consists of five students, four of whom accomplish the objective. The effectiveness of the instruction, in terms of the proportion of students who accomplished the objective, is 80%. However, the 95% confidence limits for a proportion based on 4 correct answers in 5 trials are 0.343 and 0.990. (These confidence limits assume that a random sample is drawn from an infinitely large population. Since most instructional development efforts involve materials which will be used by a large number of students, and since students for a tryout should be randomly sampled from the target population, this assumption seems reasonable.) The widely separated values of the confidence limits imply that we should be extremely cautious in drawing any inferences about the instructional effectiveness for the total population from a tryout sample of five. Unfortunately, increasing the sample size while staying within the bounds of practical constraints does not help much. For example, the 95% confidence limits for a proportion are 0.397 and 0.963 for 8 correct in 10 trials; 0.589 and 0.929 for 16 correct in 20 trials; 0.636 and 0.909 for 24 correct in 30 trials; approximately 0.67 and 0.90 for 40 correct in 50 trials; and approximately 0.71 and 0.88 for 80 correct in 100 trials.
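The slow narrowing of the confidence limits can be explored with a short sketch. The paper does not say how its limits were obtained, so the exact (Clopper-Pearson) binomial method used here is a swapped-in choice and gives somewhat different values for small samples (for 4 of 5, roughly 0.28 and 0.99); it does show the same pattern of only gradual narrowing as the sample grows. The function names are hypothetical; only the Python standard library is used.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); returns 0 when x < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def solve_increasing(f, target, iterations=100):
    """Bisection on [0, 1] for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, confidence=0.95):
    """Exact two-sided limits for a binomial proportion (x successes in n trials)."""
    tail = (1 - confidence) / 2
    # Lower limit: P(X >= x | p) = tail, and P(X >= x) = 1 - P(X <= x - 1) grows with p.
    lower = 0.0 if x == 0 else solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), tail)
    # Upper limit: P(X <= x | p) = tail, rewritten as P(X > x | p) = 1 - tail.
    upper = 1.0 if x == n else solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - tail)
    return lower, upper


for successes, trials in [(4, 5), (8, 10), (16, 20), (24, 30), (40, 50), (80, 100)]:
    lo, hi = clopper_pearson(successes, trials)
    print(f"{successes}/{trials}: {lo:.3f} to {hi:.3f}")
```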

Sequential testing is an alternative to tryout situations in which the sample size is fixed before the tryout begins and no decision is possible until all the data have been analyzed. A sequential testing strategy takes advantage of the fact that a very good product or a very poor product can be expected to reveal its character when only a small sample is tested, and that more extensive sampling is necessary only for products of borderline quality. In general, the sequential testing strategy calls for observing one sample item at a time, with the possibility of a decision about the total population after each observation. That is, sample items are drawn randomly, one at a time, from the population, and, based on observations of that sample, the total population is 1) accepted, 2) rejected, or 3) no decision is made. When neither an acceptance nor a rejection decision can be made, another sample item is selected and the decision rule is applied again. The process continues until an acceptance or rejection decision can be reached.
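The three-way procedure just described can be sketched as a small driver loop. This is a minimal illustration, assuming observations arrive as pass/fail flags; `run_sequential` and `toy_rule` are hypothetical names, and the toy rule is only a stand-in to exercise the loop, not the sequential likelihood ratio test developed below.

```python
from typing import Callable, Iterable, Optional, Tuple


def run_sequential(
    failures: Iterable[bool],
    decide: Callable[[int, int], Optional[str]],
) -> Tuple[Optional[str], int]:
    """Observe one item at a time and stop as soon as `decide` returns a decision.

    failures: True for each sampled student who failed the objective.
    decide(n, k): returns "accept", "reject", or None (no decision yet), given
    n observations so far, k of them failures.
    Returns (decision, n); decision is None if the data run out first.
    """
    n = k = 0
    for failed in failures:
        n += 1
        k += int(failed)
        decision = decide(n, k)
        if decision is not None:
            return decision, n
    return None, n


def toy_rule(n: int, k: int) -> Optional[str]:
    """Hypothetical stand-in rule, NOT the sequential likelihood ratio test."""
    if k >= 2:
        return "reject"
    if n - k >= 3:
        return "accept"
    return None


print(run_sequential([False, False, False, True], toy_rule))  # ('accept', 3)
print(run_sequential([True, False, True], toy_rule))          # ('reject', 3)
```

Any rule with the same (n, k) signature can be dropped in; the loop itself only encodes "accept, reject, or take another observation."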

In terms of formative evaluation, the total population consists of all students for whom the instruction is intended. One sample item corresponds to one student chosen at random from the target population. The performance of each student after completing the instruction is used to predict how well any other student in the target population would do, were he exposed to the instruction.

Under the 80% criterion rule, the instruction will be considered effective if at least 80% of the students in the target population accomplish the objective. This criterion may also be interpreted as the probability that any randomly chosen student will accomplish the objective, that is, 0.80. In other words, the performance of a randomly chosen student may be considered a Bernoulli variable, with the probability of success being equal to instructional effectiveness. An acceptance decision or a rejection decision can be made when sufficient evidence to draw inferences about the instructional effectiveness has been gathered.



Four parameters are required to develop a sequential testing plan. Two of the parameters are related to the required level of instructional effectiveness; they will be designated $p_1$ and $p_2$. The other two parameters, $\alpha$ and $\beta$, are related to the amount of error in decision making that can be tolerated.

$\alpha$ is determined by answering the following question: "What percent of high quality instruction can be erroneously rejected?" $\beta$ is determined by answering a similar question: "What percent of low quality instruction can be erroneously accepted?"

Notice that the model is based on the fact that absolute accuracy in decision making can never be achieved when decisions are based on sample data. The strength of the model lies in the explicit statement of the allowable error. However, the possibility of error also implies that an area of indecision exists. This means that we can no longer demand point estimates of instructional effectiveness. Rather, we must define what is meant by unacceptably low quality and unquestionably high quality. The area between low quality and high quality is, in effect, an area of indifference or indecision. For example, if we want to restate the 80% criterion in terms useful to sequential testing, we might say that instruction which is effective for 90% of the target population is of unquestionably high quality, but that instruction which is effective for only 70% of the target population is certainly unacceptable. About instruction which is effective for between 70% and 90% of the target population we are indifferent: it may or may not be acceptable. The parameters $p_1$ and $p_2$ specify explicitly what is meant by "high quality" and "low quality". By varying the values of $p_1$, $p_2$, $\alpha$, and $\beta$, it is possible to specify in great detail exactly what is required of an instructional program. The model is also readily adaptable to more elaborate systems incorporating differential loss functions.

The original development of the mathematics for sequential testing strategies was described by A. Wald in his book Sequential Analysis (John Wiley & Sons, 1947). The equations which follow were adapted from the discussion of sequential testing in Lindgren and McElrath (1966) and Crow, Davis and Maxfield (1960).

The sequential likelihood ratio test is designed for testing between two simple hypotheses which, in the case of a Bernoulli population, can be written:

$$H_1: p = p_1 \qquad\qquad H_2: p = p_2$$

where $p_1 = 1 -$ the effectiveness of high quality instruction and $p_2 = 1 -$ the effectiveness of low quality instruction. In other words, $p_1$ represents the probability that a student will be unable to accomplish the objective when the instruction is of unquestionably high quality, and $p_2$ represents the probability that a student will be unable to accomplish the objective when the instruction is of unacceptably low quality.

The test is based on the value of the likelihood ratio computed after each observation, including all of the observations obtained up to that point:

$$(1)\qquad \lambda_n = \frac{L(p_2)}{L(p_1)}$$

where $\lambda_n$ = the likelihood ratio,
$L(p_2) = p_2^{k}(1-p_2)^{n-k}$ = the likelihood of $p_2$,
$L(p_1) = p_1^{k}(1-p_1)^{n-k}$ = the likelihood of $p_1$,
$n$ = the number of observations, and
$k$ = the number of successes, that is, observations of the event whose probability is $p$ (here, students who fail to accomplish the objective).

Two values, $A$ and $B$, are chosen such that (1) if $\lambda_n < B$, hypothesis $H_1: p = p_1$ is chosen; (2) if $\lambda_n > A$, hypothesis $H_2: p = p_2$ is chosen; and (3) if $B < \lambda_n < A$, another item is chosen and another observation is made. Good approximations of $A$ and $B$ in terms of the two types of decision making error, $\alpha$ and $\beta$, are specified as

$$(2)\qquad B = \frac{\beta}{1-\alpha}, \qquad A = \frac{1-\beta}{\alpha}$$
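Equations (1) and (2) can be sketched directly in Python. This is a minimal illustration, assuming $k$ counts students who fail the objective (the event with probability $p$); the function names are hypothetical. The ratio does not depend on the order in which passes and failures occur, because the number of possible orderings (the binomial coefficient) cancels. The two sample calls mirror the last two steps of the hypothetical tryout in Table 3.

```python
def likelihood_ratio(p1, p2, n, k):
    """Equation (1): lambda_n = L(p2) / L(p1), where L(p) = p**k * (1 - p)**(n - k)."""
    return (p2**k * (1 - p2) ** (n - k)) / (p1**k * (1 - p1) ** (n - k))


def wald_bounds(alpha, beta):
    """Equation (2): B = beta / (1 - alpha) and A = (1 - beta) / alpha."""
    return beta / (1 - alpha), (1 - beta) / alpha


def sprt_step(n, k, p1, p2, alpha, beta):
    """One step of the test after n observations with k 'successes'
    (failures to accomplish the objective).

    Returns "H1" (p = p1, high quality), "H2" (p = p2, low quality), or None
    (no decision: take another observation). A ratio exactly equal to a bound
    is treated as a decision here, an assumption; the text states strict inequalities.
    """
    lam = likelihood_ratio(p1, p2, n, k)
    lower, upper = wald_bounds(alpha, beta)
    if lam <= lower:
        return "H1"
    if lam >= upper:
        return "H2"
    return None


# Values from the paper's hypothetical example: p1 = 0.10, p2 = 0.30,
# alpha = 0.01, beta = 0.10 (so B is about 0.101 and A = 90).
print(sprt_step(14, 1, 0.10, 0.30, 0.01, 0.10))  # None: 1 failure in 14 students
print(sprt_step(15, 1, 0.10, 0.30, 0.01, 0.10))  # H1: 1 failure in 15 students
```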

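Combining equations (1) and (2), the inequality above may be written:

$$(3)\qquad \frac{\beta}{1-\alpha} < \frac{p_2^{k}(1-p_2)^{n-k}}{p_1^{k}(1-p_1)^{n-k}} < \frac{1-\beta}{\alpha}$$

In order to simplify the computations, we take the logarithm of each term in (3) and rearrange terms so that (3) is expressed in terms of $k$, the number of successes.

$$(4)\qquad \log\frac{\beta}{1-\alpha} < k\log\frac{p_2}{p_1} + (n-k)\log\frac{1-p_2}{1-p_1} < \log\frac{1-\beta}{\alpha}$$

$$(5)\qquad \log\frac{\beta}{1-\alpha} < k\log\frac{p_2}{p_1} + n\log\frac{1-p_2}{1-p_1} - k\log\frac{1-p_2}{1-p_1} < \log\frac{1-\beta}{\alpha}$$

$$(6)\qquad \log\frac{\beta}{1-\alpha} - n\log\frac{1-p_2}{1-p_1} < k\left[\log\frac{p_2}{p_1} - \log\frac{1-p_2}{1-p_1}\right] < \log\frac{1-\beta}{\alpha} - n\log\frac{1-p_2}{1-p_1}$$

Because $p_2 > p_1$, the bracketed coefficient of $k$ is positive, so dividing through by it preserves the direction of the inequalities:

$$(7)\qquad \frac{\log\frac{\beta}{1-\alpha} - n\log\frac{1-p_2}{1-p_1}}{\log\frac{p_2}{p_1} - \log\frac{1-p_2}{1-p_1}} < k < \frac{\log\frac{1-\beta}{\alpha} - n\log\frac{1-p_2}{1-p_1}}{\log\frac{p_2}{p_1} - \log\frac{1-p_2}{1-p_1}}$$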
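The algebra from (3) to (7) can be checked numerically: for every possible outcome, the continuation region stated with the likelihood ratio and the continuation region stated as bounds on $k$ should agree. This is a minimal sketch with hypothetical function names, using common logarithms and the hypothetical example's parameters; the check relies on $p_2 > p_1$.

```python
from math import log10


def continue_by_ratio(n, k, p1, p2, alpha, beta):
    """Inequality (3): B < lambda_n < A, computed from the likelihood ratio."""
    lam = (p2 / p1) ** k * ((1 - p2) / (1 - p1)) ** (n - k)
    return beta / (1 - alpha) < lam < (1 - beta) / alpha


def continue_by_k(n, k, p1, p2, alpha, beta):
    """Inequality (7): left extreme < k < right extreme."""
    c = log10((1 - p2) / (1 - p1))
    denom = log10(p2 / p1) - c  # positive because p2 > p1
    left = (log10(beta / (1 - alpha)) - n * c) / denom
    right = (log10((1 - beta) / alpha) - n * c) / denom
    return left < k < right


# Compare the two forms for every possible outcome with up to 40 students.
params = (0.10, 0.30, 0.01, 0.10)
agree = all(
    continue_by_ratio(n, k, *params) == continue_by_k(n, k, *params)
    for n in range(1, 41)
    for k in range(n + 1)
)
print(agree)
```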

Finally, we take advantage of the fact that the extremes of the inequality in (7) are linear functions of $n$, compute them, and graph them at the beginning of the test. The value of $k$ is then plotted as the test proceeds, and when it crosses one of the two straight lines, the test stops and a decision is arrived at.

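Simple computational equations for the two straight lines are derived as follows:

$$(8)\qquad d_1 = \frac{\log\frac{\beta}{1-\alpha} - n\log\frac{1-p_2}{1-p_1}}{\log\frac{p_2}{p_1} - \log\frac{1-p_2}{1-p_1}} \quad \text{(left extreme of (7))}, \qquad d_2 = \frac{\log\frac{1-\beta}{\alpha} - n\log\frac{1-p_2}{1-p_1}}{\log\frac{p_2}{p_1} - \log\frac{1-p_2}{1-p_1}} \quad \text{(right extreme of (7))}$$

$$(9)\qquad g_1 = \log\frac{p_2}{p_1}, \quad g_2 = -\log\frac{1-p_2}{1-p_1}, \quad a = \log\frac{1-\beta}{\alpha}, \quad b = -\log\frac{\beta}{1-\alpha}$$

$$h_1 = \frac{b}{g_1+g_2}, \quad h_2 = \frac{a}{g_1+g_2}, \quad s = \frac{g_2}{g_1+g_2}$$

$$(10)\qquad d_1 = -h_1 + sn \ \text{(lower line)}, \qquad d_2 = h_2 + sn \ \text{(upper line)}$$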
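Equations (9) and (10) translate into a few lines of Python. The sketch below is an illustration under stated assumptions: it uses common (base-10) logarithms, matching the numbers in Table 2, and the function names are hypothetical. The quantities $h_1$, $h_2$, and $s$ are ratios of logarithms, so they come out the same in any base; only $g_1$, $g_2$, $a$, and $b$ depend on the base.

```python
from math import log10


def sequential_plan(p1, p2, alpha, beta):
    """Equation (9), using common (base-10) logarithms as in Table 2."""
    g1 = log10(p2 / p1)
    g2 = -log10((1 - p2) / (1 - p1))
    a = log10((1 - beta) / alpha)
    b = -log10(beta / (1 - alpha))
    return {
        "g1": g1, "g2": g2, "a": a, "b": b,
        "h1": b / (g1 + g2),
        "h2": a / (g1 + g2),
        "s": g2 / (g1 + g2),
    }


def decision_lines(plan, n):
    """Equation (10): lower line d1 and upper line d2 after n observations."""
    return -plan["h1"] + plan["s"] * n, plan["h2"] + plan["s"] * n


plan = sequential_plan(0.10, 0.30, 0.01, 0.10)
print({name: round(value, 3) for name, value in plan.items()})
print(decision_lines(plan, 15))
```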

/ 

In order to illustrate the operation of sequential testing strategies, an example will be carried out in detail. Assume that instruction which is effective for approximately 80% of the students is desired. High quality instruction will be defined as instruction which is effective for 90% of the students; $p_1$ is then $1.0 - 0.90 = 0.10$. Low quality instruction will be defined as instruction which is effective for only 70% of the students; $p_2$ is then $1.0 - 0.70 = 0.30$. The instructional developer must then decide how much decision making error can be tolerated. For this example, assume that it is relatively costly to erroneously reject high quality instruction. Thus, a reasonable value for $\alpha$, the probability of rejecting instruction that is effective for 90% of the target population, might be 0.01.

In many cases, the cost of erroneously accepting instruction that is of lower quality than that desired may be less than the cost of erroneously rejecting high quality instruction. The reasons for this lower cost vary from case to case, but revolve primarily around the fact that the instruction can always be improved if it is found to be unacceptable in practice. Under these circumstances a reasonable value for $\beta$, the probability of erroneously accepting instruction that is effective for 70% of the students, might be 0.10. The values for $\alpha$ and $\beta$ refer directly to the values of $p_1$ and $p_2$, respectively.

The probability of rejecting instruction that is better than the instruction defined as high quality will always be less than $\alpha$. Similarly, the probability of accepting instruction that is less effective than the instruction defined as low quality will always be less than $\beta$. In other words, $\alpha$ and $\beta$ represent the decision making error that is tolerable for the worst possible case, when the instructional effectiveness is exactly $1 - p_1$ or $1 - p_2$. If the instruction is better or worse than the specified values of effectiveness, the errors in decision making will be less than $\alpha$ or $\beta$. Given these values, $p_1 = 0.10$, $p_2 = 0.30$, $\alpha = 0.01$, $\beta = 0.10$, it is possible to use the computational equations in (9) and (10) above to generate a sequential testing plan; the computations are shown in Table 2.

Necessary parameters: $p_1 = .10$, $p_2 = .30$, $\alpha = .01$, $\beta = .10$

Calculations:

$$\begin{aligned}
g_1 &= \log\frac{p_2}{p_1} = \log\frac{.30}{.10} = \log 3 = .477\\
g_2 &= -\log\frac{1-p_2}{1-p_1} = -\log\frac{.70}{.90} = -\log(.7778) = .109\\
a &= \log\frac{1-\beta}{\alpha} = \log\frac{.90}{.01} = \log 90 = 1.954\\
b &= -\log\frac{\beta}{1-\alpha} = -\log\frac{.10}{.99} = -\log(.1010) = .996\\
h_1 &= b/(g_1+g_2) = .996/(.477+.109) = .996/.586 = 1.70\\
h_2 &= a/(g_1+g_2) = 1.954/(.477+.109) = 1.954/.586 = 3.33\\
s &= g_2/(g_1+g_2) = .109/(.477+.109) = .109/.586 = .186\\
d_1 &= -h_1 + sn = -1.70 + .186n\\
d_2 &= h_2 + sn = 3.33 + .186n
\end{aligned}$$

Table 2: Calculations for Sequential Plan to Evaluate Hypothetical Example

The first step in using the sequential plan is to plot lines $d_1$ and $d_2$ (Table 2). The instructional developer then administers the test of the objective to the first randomly chosen student. If he passes, a point one unit to the right of the zero point is plotted on the graph. If he fails, a point one unit to the right and one unit up from the zero point is plotted. The instructional developer then checks whether the plotted point has crossed into either the acceptance region or the rejection region. If it has, the appropriate decision is made and further testing is unnecessary. If no decision can be made, another student is randomly selected and tested, and the same procedure is followed, starting at the previously plotted point rather than at zero. The procedure continues until a decision is reached. Table 3 contains hypothetical test results and the subsequent actions of the instructional developer. Figure 1 shows the sequential plan, based on the calculations in Table 2, which guided the decision making. The numbers which fall on the graph correspond to the student numbers in Table 3. The result of this hypothetical example is that the instruction is accepted with no immediate need for revision, based on a tryout sample of 15 students.

An alternative to the above plotting procedure is available, particularly if student test data can be gathered on a computer. The procedure simply calls for calculating the values of the extremes in equation (7). After each student attempts the test, the value of $k$, the total number of successes (students who have failed the objective), is compared to the values of the extremes of the inequality. If $k$ is less than the value of the left-hand extreme, accept the instruction. If $k$ is greater than the value of the right-hand extreme, reject the instruction. If $k$ falls between the extremes, continue sampling.
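The computer alternative just described can be sketched as follows. This is a minimal version, assuming results arrive as "Pass"/"Fail" strings in testing order and that $k$ counts failures, as in the plotting procedure where a failure moves the point up; the helper names are hypothetical. Run on the hypothetical data of Table 3, it reaches the same decision as the plotted plan.

```python
from math import log10


def make_lines(p1, p2, alpha, beta):
    """Left (d1) and right (d2) extremes of inequality (7) as functions of n."""
    g1 = log10(p2 / p1)
    g2 = log10((1 - p1) / (1 - p2))
    a = log10((1 - beta) / alpha)
    b = log10((1 - alpha) / beta)
    h1, h2, s = b / (g1 + g2), a / (g1 + g2), g2 / (g1 + g2)
    return (lambda n: -h1 + s * n), (lambda n: h2 + s * n)


def sequential_test(results, p1, p2, alpha, beta):
    """Apply the plan to 'Pass'/'Fail' results in the order the students were tested.

    k counts failures, the event whose probability is p in this model.
    Returns (decision, number of students tested).
    """
    d1, d2 = make_lines(p1, p2, alpha, beta)
    n = k = 0
    for n, outcome in enumerate(results, start=1):
        if outcome == "Fail":
            k += 1
        if k < d1(n):
            return "accept", n
        if k > d2(n):
            return "reject", n
    return "no decision", n


table_3 = ["Pass"] * 8 + ["Fail"] + ["Pass"] * 6
print(sequential_test(table_3, 0.10, 0.30, 0.01, 0.10))  # ('accept', 15)
```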

Student   Test
Number    Result   Action
  0         -      Begin at 0/0. Begin sampling.
  1       Pass     Move 1 unit right. No decision: continue sampling.
  2       Pass     Move 1 unit right. No decision: continue sampling.
  3       Pass     Move 1 unit right. No decision: continue sampling.
  4       Pass     Move 1 unit right. No decision: continue sampling.
  5       Pass     Move 1 unit right. No decision: continue sampling.
  6       Pass     Move 1 unit right. No decision: continue sampling.
  7       Pass     Move 1 unit right. No decision: continue sampling.
  8       Pass     Move 1 unit right. No decision: continue sampling.
  9       Fail     Move 1 unit right and 1 unit up. No decision: continue sampling.
 10       Pass     Move 1 unit right. No decision: continue sampling.
 11       Pass     Move 1 unit right. No decision: continue sampling.
 12       Pass     Move 1 unit right. No decision: continue sampling.
 13       Pass     Move 1 unit right. No decision: continue sampling.
 14       Pass     Move 1 unit right. No decision: continue sampling.
 15       Pass     Move 1 unit right. ACCEPT INSTRUCTION: STOP.

Table 3: Hypothetical student test data used to implement a sequential testing plan

An obvious concern with sequential testing is: will the procedure ever terminate, or will we forever remain in the region of indecision? Lindgren and McElrath (1966) state that Wald has shown that the test will terminate with probability one. Further, they point out that the number of observations required to reach a decision has an expected value that is usually less than the number of observations required to reach a decision with the same precision using a fixed sample size. Crow, Davis and Maxfield (1960) discuss the use of truncated sampling plans to prevent the possibility of requiring a very large sample. Referring to equation (9), we agree to stop sampling when $n = 3ab/(g_1 g_2)$ (for the hypothetical example above, this value equals approximately 112). This procedure results in negligible changes in the values of $\alpha$ and $\beta$. If $n$, the number of samples, gets this large with no decision, we accept the instruction, provided that the vertical distance from the $n$th point to the lower line is less than the vertical distance from the $n$th point to the upper line. Otherwise, we reject the instruction.
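The truncation rule and its tie-breaking decision can be sketched as below, assuming common logarithms as in Table 2. The function names are hypothetical, and the points $n = 113$ with $k = 21$ or $k = 23$ are made-up inputs chosen only to exercise each branch (113 is the first whole number of students past the truncation point of about 112).

```python
from math import log10


def plan_constants(p1, p2, alpha, beta):
    """g1, g2, a, b from equation (9), in common logarithms."""
    g1 = log10(p2 / p1)
    g2 = log10((1 - p1) / (1 - p2))
    a = log10((1 - beta) / alpha)
    b = log10((1 - alpha) / beta)
    return g1, g2, a, b


def truncation_point(p1, p2, alpha, beta):
    """Sampling stops once n reaches 3ab / (g1 * g2)."""
    g1, g2, a, b = plan_constants(p1, p2, alpha, beta)
    return 3 * a * b / (g1 * g2)


def forced_decision(n, k, p1, p2, alpha, beta):
    """Decision at truncation: accept if the point (n, k) is vertically
    closer to the lower line d1 than to the upper line d2, else reject."""
    g1, g2, a, b = plan_constants(p1, p2, alpha, beta)
    s = g2 / (g1 + g2)
    d1 = -b / (g1 + g2) + s * n
    d2 = a / (g1 + g2) + s * n
    return "accept" if k - d1 < d2 - k else "reject"


params = (0.10, 0.30, 0.01, 0.10)
print(round(truncation_point(*params), 1))
print(forced_decision(113, 21, *params), forced_decision(113, 23, *params))
```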

Baker and Alkin (1973) discuss the problems associated with empirically evaluating the usefulness of formative evaluation procedures. The sequential testing strategy suggested in this paper is as difficult to evaluate as other procedures. The major problem in judging the strength of any decision making procedure is the usual lack of suitable external criteria against which to compare decisions. However, some evidence that a sequential testing strategy is at least as useful as other procedures does exist. Two examples will be discussed.

The U.S. Army has been heavily involved in the design of audio-visual instruction to teach a wide variety of skills. Formative evaluation data were available for instruction to teach land navigation. The instruction covered eight objectives, each objective having associated with it a performance test that was scored pass/fail. Twenty-eight students participated in the formative evaluation tryout. The tryout data are shown in Table 4.



Objective   Number    Percent   95% Confidence Limits
Number      Passing   Passing   for Proportion
1           27        96        0.830   0.998
2           26        93        0.783   0.987
3           26        93        0.783   0.987
4           25        89        0.741   0.970
5           26        93        0.783   0.987
6           26        93        0.783   0.987
7           20        71        0.537   0.858
8           16        57        0.381   0.742
Overall*    26        93        0.783   0.987

* The overall criterion for passing was at least 6 of the 8 objectives accomplished.

Table 4: Results of U.S. Army Formative Evaluation for Audio-visual Instruction in Land Navigation, n = 28



The decision rule used to evaluate this instruction was the 80/80 rule: 80% of the students pass 80% of the objectives. The 95% confidence limits imply that the instruction was certainly acceptable for objective #1, and that relatively high confidence can be placed in the effectiveness of the instruction for objectives #2, #3, #4, #5, and #6. The effectiveness of the instruction for objective #7 is questionable, and the instruction for objective #8 is certainly below the minimum requirement. However, the overall data imply that the instruction may be accepted without further revision.

A sequential testing strategy was applied to the same data. The values for $p_1$, $p_2$, $\alpha$, and $\beta$ were the same values used in the hypothetical example discussed earlier. The results of the sequential testing procedure are summarized in Table 5. Figures 2a through 2h show the plotted data. Since there was no reason to believe that the students were arranged in any particular order, the results from student number 1 in the Army tryout were used first, the results from student number 2 second, and so on.


Objective   Number
Number      Tested    Decision
1           10        Accept
2           15        Accept
3           15        Accept
4           20        Accept
5           10        Accept
6           20        Accept
7           17        Reject
8            6        Reject

Table 5: Results of Sequential Testing for Formative Evaluation Data from Army Audio-visual Instruction in Land Navigation



Since I have argued that it is inappropriate to collapse across objectives, the overall data were not evaluated. The results of the sequential testing procedure agree with the results obtained using the 80/80 rule with 28 subjects; that is, we accept the instruction for objectives #1 through #6 and revise the instruction for objectives #7 and #8. In all cases, fewer students were needed than when using the 80/80 rule. In fact, the results for objective #8, clearly the objective for which revisions are most needed, were obtained with only 6 students.

Mitchell (1974) reported the results of using sequential testing for decision making during the development of instruction using interactive computer simulation. Four prototypes of the instruction were tried out and the results evaluated using a sequential testing strategy. Mitchell's values for the four necessary parameters were $p_1 = 0.80$, $p_2 = 0.50$, $\alpha = 0.01$, $\beta = 0.20$. Mitchell rejected the first three prototypes after only five students had attempted each one. The final version of the instruction was accepted on the basis of data from four students. Other data, collected as the students worked through the instruction itself, supported the rejection and acceptance decisions reached on the basis of the sequential testing.
Mitchell's conclusions concerning the usefulness of the procedure are very encouraging:

"The sequential plans technique worked optimally for this validation. It did at the outset hold the promise of being more subject-efficient than earlier techniques and this was one of the reasons for its utilization (the other reason was the risk considerations included which are not normally found in the '80/80 criterion' or similar techniques)." (p. 25)
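A rough consistency check of Mitchell's sample sizes can be sketched under an explicit assumption. This paper defines $p_1$ and $p_2$ as probabilities of failing the objective, whereas Mitchell's reported values (0.80 and 0.50) read like effectiveness levels; the sketch assumes they correspond to $p_1 = 0.20$ and $p_2 = 0.50$ in this paper's notation. It computes the fewest students with which the plan can accept (every student passes) or reject (every student fails). Under that assumption the minimums are 4 and 5, in line with the reported counts, but Mitchell's individual results are not given here, so this is not a reconstruction of his analysis. The function name is hypothetical.

```python
from math import log10


def shortest_decisions(p1, p2, alpha, beta):
    """Fewest students needed to accept after an unbroken run of passes, and
    to reject after an unbroken run of failures, under equations (9)-(10)."""
    g1 = log10(p2 / p1)
    g2 = log10((1 - p1) / (1 - p2))
    h1 = log10((1 - alpha) / beta) / (g1 + g2)
    h2 = log10((1 - beta) / alpha) / (g1 + g2)
    s = g2 / (g1 + g2)
    accept_n = next(n for n in range(1, 10_000) if 0 < -h1 + s * n)  # k = 0 below d1
    reject_n = next(n for n in range(1, 10_000) if n > h2 + s * n)   # k = n above d2
    return accept_n, reject_n


# Assumed conversion of Mitchell's effectiveness levels (0.80 and 0.50)
# to this paper's failure probabilities: p1 = 0.20, p2 = 0.50.
print(shortest_decisions(0.20, 0.50, 0.01, 0.20))  # (4, 5)
# The paper's hypothetical example, for comparison.
print(shortest_decisions(0.10, 0.30, 0.01, 0.10))  # (10, 5)
```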

The major impetus behind this paper was dissatisfaction with commonly available decision making procedures for formative evaluation. A need exists for explicit information describing what is required of instructional development efforts. It is also necessary to specify the risks that may be tolerated in accepting instruction as being ready for use, or in rejecting it and beginning the revision procedure. These requirements are particularly acute when the developer and the user are not the same people. Sequential testing strategies may help to make the formative evaluation process more explicit. At the same time, they may help to ease some of the resource demands of formative evaluation by decreasing the number of subjects required for the tryout sample. More experience will be required to learn how the procedure can best be implemented and to determine the value of the decisions made. Whatever the outcome of future research, the mere fact that an attempt is made to deal with formative evaluation in an objective manner will hopefully lead to improved instruction.



[Figure 1: Sequential plan for the hypothetical example (plot not reproduced).]

[Figures 2a through 2h: Sequential plans for Army audio-visual instruction, objectives 1 through 8 (plots not reproduced).]

References

Baker, E. L., & Alkin, M. C. Annual review paper: Formative evaluation of instructional development. Audio-Visual Communication Review, 1973, 21, 389-418.

Crow, E. L., Davis, F. A., & Maxfield, M. W. Statistics Manual. New York: Dover Publications, Inc., 1960.

Lindgren, B. W., & McElrath, G. W. Introduction to Probability and Statistics. New York: The Macmillan Company, 1966.

Mitchell, M. C., Jr. The systematic design and validation of an interactive instructional computer simulation. Unpublished manuscript, Florida State University, 1974.






